Using Deepchecks Vision With a Few Lines of Code#

Deepchecks Vision is built to validate your data and model, however complex they may be. That said, sometimes there is no need to write a full-blown ClassificationData or DetectionData class. For a simple classification task, quite a few checks can be run with only a few lines of code. In this tutorial, we will show you how to run all the checks that do not require a model on a simple classification task.

This is ideal, for example, when you receive a new dataset for a classification task. Running these checks on the dataset before you even start training gives you a quick idea of what the dataset looks like and what potential issues it contains.

Defining the data#

The data is the hymenoptera (ants and bees) image dataset hosted on the PyTorch download server. We will download it and extract it to the current directory.

import urllib.request
import zipfile
import os

url = 'https://download.pytorch.org/tutorial/hymenoptera_data.zip'
urllib.request.urlretrieve(url, 'hymenoptera_data.zip')

with zipfile.ZipFile('hymenoptera_data.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

# Rename val folder to test, because the simple classification task expects a test folder.
if not os.path.exists('hymenoptera_data/test'):
    os.rename('hymenoptera_data/val', 'hymenoptera_data/test')
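The download step above fetches the archive every time it runs. A minimal sketch of a re-runnable variant follows, assuming a hypothetical helper named `fetch_and_extract` (not part of Deepchecks or PyTorch) that skips the download when the archive is already on disk:

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, archive_path, dest_dir="."):
    """Download url to archive_path unless it already exists, then extract it.

    Hypothetical helper for illustration; not part of Deepchecks.
    Returns the sorted top-level entry names found inside the archive.
    """
    archive_path = Path(archive_path)
    if not archive_path.exists():
        # Only hit the network when the archive is not cached locally
        urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
        # Collect the first path component of every archive member
        return sorted({name.split("/")[0] for name in zip_ref.namelist()})
```

For this tutorial it could be called as `fetch_and_extract(url, 'hymenoptera_data.zip')`, followed by the same `val` to `test` rename as above.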

Loading a Simple Classification Dataset#

A simple classification dataset is an image dataset structured in the following way:

  • root/
    • train/
      • class1/
        • image1.jpeg
    • test/
      • class1/
        • image1.jpeg
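Before loading, it can help to confirm that a folder actually follows this layout. The sketch below is one way to do that with the standard library only. `summarize_layout` is a hypothetical helper name, not a Deepchecks function, and it assumes all images share a single file extension:

```python
from pathlib import Path


def summarize_layout(root, image_extension="jpg"):
    """Count images per class folder for the 'train' and 'test' splits.

    Hypothetical helper for illustration; not part of Deepchecks.
    Raises FileNotFoundError if either split folder is missing.
    """
    summary = {}
    for split in ("train", "test"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # One entry per class sub-folder, mapping class name to image count
        summary[split] = {
            class_dir.name: sum(1 for _ in class_dir.glob(f"*.{image_extension}"))
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary
```

Calling `summarize_layout('hymenoptera_data')` returns a nested dictionary of the form `{split: {class_name: image_count}}`. A class with zero images, or a class that appears in only one split, is worth investigating before running any checks.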

from deepchecks.vision.simple_classification_data import load_dataset

train_ds = load_dataset('hymenoptera_data', train=True, object_type='VisionData', image_extension='jpg')
test_ds = load_dataset('hymenoptera_data', train=False, object_type='VisionData', image_extension='jpg')

Running Deepchecks' full suite#

That's it, we have just defined the classification data object and are ready to run the train_test_validation suite:

from deepchecks.vision.suites import train_test_validation

suite = train_test_validation()
result = suite.run(train_ds, test_ds)

Out:

Validating Input:   0%| | 0/1 [00:00<?, ? /s]
Ingesting Batches - Train Dataset: 100%|########| 8/8 [00:19<00:00,  2.11s/ Batch]
Ingesting Batches - Test Dataset: 100%|#####| 5/5 [00:14<00:00,  2.91s/ Batch]
Computing Checks:   0%|      | 0/6 [00:00<?, ? Check/s, Check=Heatmap Comparison]
Computing Checks:  17%|#     | 1/6 [00:00<00:00, 21.48 Check/s, Check=Train Test Label Drift]
Computing Checks:  33%|##    | 2/6 [00:00<00:00, 24.12 Check/s, Check=Train Test Prediction Drift]
Computing Checks:  50%|###   | 3/6 [00:00<00:00, 36.07 Check/s, Check=Image Property Drift]
Computing Checks:  67%|####  | 4/6 [00:00<00:00, 11.35 Check/s, Check=Image Dataset Drift]
Computing Checks:  83%|##### | 5/6 [00:00<00:00, 11.35 Check/s, Check=Simple Feature Contribution]
Computing Checks: 100%|######| 6/6 [00:00<00:00,  7.62 Check/s, Check=Simple Feature Contribution]

Observing the results:#

The results can be saved as an HTML file with the following code:

result.save_as_html('output.html')

Or, if working inside a notebook, the output can be displayed directly by simply printing the result object:

result


Total running time of the script: ( 0 minutes 37.971 seconds)
